Conversation
|
pulled this down to verify with my examples a few notes:
04_scaling_performance/01_autoscaling/gpu_worker.py configures scaling strategies: scale_to_zero_config = LiveServerless( This controls how autoscaling decides to add workers — QUEUE_DELAY scales based on how long jobs wait in queue, Endpoint() doesn't have these params, so there's no way to express this: What we'd want:@endpoint(name="worker", gpu=GpuGroup.ANY, workers=(0, 3), Could we add scaler_type / scaler_value (or a combined scaler= param)?
template = PodTemplate( gpu_config = ServerlessEndpoint( Endpoint(image=) only takes the image name string. The other template features — dockerArgs (e.g. shared memory size), Could we either add a template= param that accepts a PodTemplate, or surface these as top-level kwargs on Endpoint?
Two examples use @Remote on a class for stateful workers. Here's the pattern from @Remote(resource_config=gpu_config, dependencies=["diffusers", "torch", "transformers"]) The class is instantiated once when the worker boots. The model stays in GPU memory via self.pipe and every request With function-based @endpoint, there's no self to hold state: @endpoint(name="worker", gpu=GpuGroup.ANY, dependencies=["diffusers", "torch"]) Does @endpoint(...) support decorating classes the same way @Remote does? If not, we'd need a workaround (module-level
The PR's skeleton templates use GpuType: @endpoint(name="gpu_worker", gpu=GpuType.ANY, dependencies=["torch"]) But existing examples all use GpuGroup: @endpoint(name="worker", gpu=GpuGroup.ADA_24) Both work — Endpoint(gpu=) accepts either. But they mean different things: GpuType is a specific GPU model (e.g. RTX
|
will work on adding those parameters
👍
Endpoint does support classes
we should prefer GpuType in simpler examples, since it is easier to understand, but expand to GpuGroup for situations when more scale is important |
QA ReportStatus: WARN CI StatusAll 6 Quality Gates pass (Python 3.10–3.14 + Build Package). No CI regressions detected.
PR Scope
Test File Summary
PR Diff Analysis
Observations & Issues1. Dual-purpose methods create subtle API surface ep = Endpoint(name="my-api")
ep.post("/compute") # returns a decorator
ep_client = Endpoint(id="x")
ep_client.post("/compute") # returns a coroutineThe distinction is tested but the boundary between "no data arg = decorator" vs "data=None = client call" is not explicitly tested. A user calling 2. 3. Endpoint with Test Quality AssessmentStrengths:
Missing Coverage:
Suggested Improvements:
Review Comments IntegrationThe PR already addresses the 4 review items from @runpod-Henrik:
RecommendationMERGE WITH NOTES The PR is solid — 161 tests, CI green on all Python versions, comprehensive coverage of the new Endpoint API. The dual-purpose method design and
Generated by flash-qa agent |
@remote and ServerlessResource-based classes
@remote and ServerlessResource-based classes@remote and ServerlessResource-based classes
There was a problem hiding this comment.
Pull request overview
This PR introduces a unified Endpoint facade as the single user-facing API for Flash endpoints, consolidating the previous @remote decorator + multiple resource config classes into one entrypoint while keeping the underlying provisioning/handler pipeline intact via internal resource-config unwrapping.
Changes:
- Added
runpod_flash.endpoint.EndpointandEndpointJobto support QB (decorator) mode, LB (route registration) mode, and client mode (id=/image=). - Updated scanning/discovery/manifest generation and skeleton templates/docs to recognize and showcase the new
Endpointpatterns. - Marked legacy
@remoteand legacy resource config classes as deprecated (warnings + compatibility import paths), and added extensive unit tests.
Reviewed changes
Copilot reviewed 30 out of 31 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_skeleton_endpoint.py | Verifies skeleton templates use Endpoint (not @remote / legacy classes). |
| tests/unit/test_endpoint_client.py | Adds unit tests for client-mode calls and EndpointJob lifecycle methods. |
| tests/unit/test_endpoint.py | Adds broad unit coverage for Endpoint construction, mode inference, and internal config selection. |
| tests/unit/test_discovery_endpoint.py | Ensures ResourceDiscovery recognizes Endpoint LB patterns and resolves to deployable resources. |
| tests/unit/test_deprecations.py | Ensures deprecation warnings are emitted for legacy imports/usages. |
| tests/unit/runtime/test_resource_provisioner.py | Extends resource provisioner tests for manifest resource_type="Endpoint". |
| tests/unit/cli/commands/test_run_endpoint.py | Tests worker scanning + server generation for Endpoint QB/LB patterns. |
| tests/unit/cli/commands/build_utils/test_scanner_endpoint.py | Adds comprehensive scanner tests for Endpoint QB/LB patterns and edge cases. |
| tests/unit/cli/commands/build_utils/test_manifest_endpoint.py | Tests manifest-building and deployment-config extraction for Endpoint resources. |
| src/runpod_flash/runtime/resource_provisioner.py | Adds manifest-time mapping from Endpoint to underlying resource classes and extracts scaler fields. |
| src/runpod_flash/endpoint.py | Implements Endpoint + EndpointJob, including internal resource-config selection and client methods. |
| src/runpod_flash/core/discovery.py | Extends discovery to detect Endpoint LB usage (ep = Endpoint(...) + @ep.get/post/...). |
| src/runpod_flash/client.py | Deprecates remote() (warning) and adds _internal flag for Endpoint internals. |
| src/runpod_flash/cli/utils/skeleton_template/lb_worker.py | Updates LB skeleton template to use Endpoint route registration. |
| src/runpod_flash/cli/utils/skeleton_template/gpu_worker.py | Updates GPU QB skeleton template to @Endpoint(...) and adds a small main test harness. |
| src/runpod_flash/cli/utils/skeleton_template/cpu_worker.py | Updates CPU QB skeleton template to @Endpoint(...) and adds a small main test harness. |
| src/runpod_flash/cli/utils/skeleton_template/README.md | Updates template README to document QB/LB/client usage via Endpoint. |
| src/runpod_flash/cli/commands/run.py | Updates CLI “no workers found” guidance to show Endpoint examples. |
| src/runpod_flash/cli/commands/build_utils/scanner.py | Adds Endpoint QB/LB AST detection and metadata emission. |
| src/runpod_flash/cli/commands/build_utils/manifest.py | Unwraps Endpoint when extracting deployment config + adds scaler fields. |
| src/runpod_flash/cli/commands/_run_server_helpers.py | Unwraps Endpoint in LB execution helper before provisioning. |
| src/runpod_flash/init.py | Exposes Endpoint/EndpointJob and adds deprecation warnings for legacy names. |
| docs/Using_Remote_With_LoadBalancer.md | Rewritten to describe LB endpoints via Endpoint rather than @remote + LB classes. |
| docs/Load_Balancer_Endpoints.md | Updates docs to position Endpoint as user-facing API, legacy classes as internal. |
| docs/LoadBalancer_Runtime_Architecture.md | Updates runtime docs terminology and examples to Endpoint. |
| docs/GPU_Provisioning.md | Updates examples to use Endpoint patterns and scaler params. |
| docs/Flash_SDK_Reference.md | Updates SDK reference to Endpoint as primary API + adds EndpointJob section. |
| docs/Flash_Deploy_Guide.md | Updates deployment guide terminology and diagrams to Endpoint. |
| docs/Deployment_Architecture.md | Updates architecture doc to reflect Endpoint-based scanning/manifest. |
| docs/Cross_Endpoint_Routing.md | Updates routing doc examples to use Endpoint. |
| .gitignore | Adds /.pi ignore entry. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
runpod-Henrik
left a comment
There was a problem hiding this comment.
Code Review — PR #223: Unified Endpoint Class
Nice refactor unifying 8 resource config classes + @remote into a single Endpoint facade. The API design is clean and the test coverage is solid (+2,688 lines of tests). Found a few issues:
Bug 1 (HIGH): _normalize_workers accepts negative and inverted values
endpoint.py — _normalize_workers()
def _normalize_workers(workers):
if workers is None:
return (0, 1)
if isinstance(workers, int):
return (0, workers) # workers=-5 → (0, -5)
if isinstance(workers, (tuple, list)) and len(workers) == 2:
return (int(workers[0]), int(workers[1])) # (-3, -1) acceptedEndpoint(name="x", workers=-5)→(0, -5)silently acceptedEndpoint(name="x", workers=(10, 2))→min > maxsilently acceptedEndpoint(name="x", workers=(3.99, 7.01))→ silently truncated to(3, 7)byint()
Fix: Add validation:
min_w, max_w = ...
if min_w < 0 or max_w < 0:
raise ValueError(f"workers cannot be negative: ({min_w}, {max_w})")
if min_w > max_w:
raise ValueError(f"workers min ({min_w}) cannot exceed max ({max_w})")Bug 2 (MEDIUM): No duplicate route detection in _route()
endpoint.py — _route()
def _route(self, method: str, path: str):
# ...validation...
def decorator(func):
self._routes.append({"method": method, "path": path, ...})Routes are appended to a list with no duplicate check. Two functions registered on @api.post("/predict") silently coexist — last one wins at runtime.
Fix: Check before appending:
existing = [(r["method"], r["path"]) for r in self._routes]
if (method, path) in existing:
raise ValueError(f"duplicate route: {method} {path}")Bug 3 (MEDIUM): No reserved path validation
endpoint.py — _route()
The docs explicitly list /execute and /ping as reserved paths, but _route() doesn't block them. @api.post("/execute") or @api.get("/ping") would collide with framework endpoints.
Fix: Add to _route():
_RESERVED_PATHS = frozenset({"/execute", "/ping"})
if path in _RESERVED_PATHS:
raise ValueError(f"path {path} is reserved by the framework")Bug 4 (MEDIUM): Scanner hardcodes is_live_resource=True for all Endpoint patterns
scanner.py — _build_endpoint_qb_metadata() and _build_endpoint_route_metadata()
Both methods hardcode is_live_resource=True. This means flash deploy (non-live provisioning) would still see is_live_resource=True, potentially using the wrong resource class (Live* instead of deploy-time *Endpoint).
The _build_resource_config() in endpoint.py correctly calls _is_live_provisioning() at runtime, but the scanner metadata is set statically at scan time.
Fix: Either defer the flag or set it dynamically:
is_live_resource=_is_live_provisioning(), # or leave as None, let downstream decideBug 5 (MEDIUM): get()/post() silently ignore data in decorator mode
endpoint.py — get()
def get(self, path: str, data: Any = None, **kwargs):
if self.is_client:
return self._client_request("GET", path, data, **kwargs)
return self._route("GET", path) # data silently droppedIn decorator mode, data and **kwargs are silently ignored. A user writing @api.get("/health", data={"key": "val"}) gets no error — the data is silently dropped.
Fix: In decorator mode, validate no extra args:
if data is not None or kwargs:
raise TypeError(
"data and kwargs are only valid in client mode (Endpoint with id= or image=). "
"In decorator mode, use @api.get('/path') with no data argument."
)Bug 6 (LOW): EndpointJob.wait() can overshoot timeout
endpoint.py — EndpointJob.wait()
while not self.done:
if deadline is not None and time.monotonic() >= deadline:
raise TimeoutError(...)
await asyncio.sleep(interval) # sleeps THEN checks status
await self.status() # network call, can take arbitrary time
interval = min(interval * _POLL_BACKOFF_FACTOR, _POLL_MAX_INTERVAL)The deadline check happens before asyncio.sleep(interval) + the network call to status(). With _POLL_MAX_INTERVAL=5.0, the actual wait can overshoot by 5s + network latency. Not critical since it's a best-effort timeout, but worth documenting or checking deadline after the sleep too.
Bug 7 (LOW): Discovery string match "Endpoint" too broad
discovery.py — content check
if "@remote" not in content and "Endpoint" not in content:
continue # skip fileThis matches any file containing the substring "Endpoint" anywhere — comments, docstrings, variable names like api_endpoint_url. This is a pre-filter so false positives just mean extra AST parsing (performance, not correctness), but could be tightened.
Overall this is well-structured. The main concerns are Bug 1 (negative workers will propagate to the API and either error cryptically or create broken endpoints) and Bug 2/3 (route collisions are a common user error that should be caught early).
runpod-Henrik
left a comment
There was a problem hiding this comment.
Follow-up Review — PR #223
Nice work addressing the feedback from the first review. 5 of 7 original bugs are now fixed with tests. Here's where things stand:
Previously reported — now FIXED (thank you!)
- ✅ Bug 1:
_normalize_workersnegative/inverted validation (commit 2b7e838) - ✅ Bug 2: Duplicate route detection (commit 2b7e838)
- ✅ Bug 3: Reserved path validation (commit 2b7e838)
- ✅ Bug 5:
get()/post()data silently dropped in decorator mode (commit 2b7e838) - ✅
_ClientCoroutinewrapper gives clear errors when using client endpoints as decorators (commit 7cf0cf8) - ✅ R2 presigned URL auth header fix in
upload_build()(commit c4cf791)
Still open from original review
Bug 4 (MEDIUM): Scanner hardcodes is_live_resource=True
_build_endpoint_qb_metadata(), _build_endpoint_route_metadata(), and _register_endpoint_variable() all hardcode is_live_resource=True. During flash deploy, the scanner metadata would claim live mode even when deploying. The runtime _build_resource_config() correctly calls _is_live_provisioning(), but the scanner metadata doesn't match.
Likely low blast radius since the manifest extraction unwraps Endpoint and calls _build_resource_config() which does the right thing — but the scanner metadata is misleading.
New finding
_ensure_endpoint_ready() caches URL for image= mode regardless of lb flag
# image= mode: use cached result
if self._endpoint_url is not None:
return self._endpoint_url # always returns first-cached formatFor id= mode this is handled correctly (resolves fresh each time). But for image= mode, if the first call is QB-style (ep.run(...) → path URL), then an LB-style call (ep.get("/path") → subdomain URL) returns the wrong format from cache.
Fix: cache both formats, or re-derive from the deployed ID:
if self._endpoint_url is not None:
if lb:
return self._resolve_lb_url(self._deployed_id)
return self._resolve_qb_url(self._deployed_id)Overall
This is in great shape — the validation, error messages, and test coverage are significantly improved since the first review. The _ClientCoroutine wrapper is a particularly nice touch for catching the decorator-on-client-endpoint mistake. The scaler/template additions round out the feature set.
The is_live_resource hardcoding and URL caching bug are the only remaining concerns — neither is a blocker but the URL caching could cause confusing behavior for image= endpoints that make both QB and LB calls.
flash-singh0
left a comment
There was a problem hiding this comment.
looks good, bugs raised by copilot are low priority
Unified Endpoint API
Replaces 8 resource config classes (
LiveServerless,CpuLiveServerless,LiveLoadBalancer,CpuLiveLoadBalancer,ServerlessEndpoint,CpuServerlessEndpoint,LoadBalancerSlsResource,CpuLoadBalancerSlsResource) and the@remotedecorator with a single Endpoint class.Fixes AE-2259
Fixes AE-2306
Queue-based
Load-balanced
Client mode
What changed